For the following exercises, we will again use data on population from Gapminder.

As usual, we first need to read in the data. You can copy the following code, paste it into your script, and run it.

library(readr)
library(dplyr)

gap_pop <- read_csv("../data/gapminder/population_total.csv") %>% 
  rename(country = "Total population")

Again, the data are currently in wide format.

1

Select only data for the 20th century, but this time use a helper function instead of specifying a range of columns.
The helper function you should use here is starts_with(). We also want to keep the country column.
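
As a minimal sketch of how starts_with() behaves inside select(), the following uses a small made-up table rather than the Gapminder file. The object names toy_wide and toy_20th and all values are hypothetical and only serve as illustration.

```r
library(dplyr)

# Toy wide-format data (hypothetical values, not the Gapminder figures)
toy_wide <- tibble(
  country = c("A", "B"),
  `1899` = c(1, 2),
  `1900` = c(3, 4),
  `1999` = c(5, 6),
  `2000` = c(7, 8)
)

# Keep the country column plus every column whose name begins with "19"
toy_20th <- toy_wide %>%
  select(country, starts_with("19"))

names(toy_20th)
```

Because starts_with() matches on the column name as text, the prefix "19" picks up the columns 1900 and 1999 here and leaves out 1899 and 2000.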

As you may have already noticed, the dataset comprises some missing data points. Before we start analyzing the data, we might want to know for how many countries we have complete data.

2

Using the dataset in wide format, find out for how many countries we have complete data.
To answer this question you should use the drop_na() function from tidyr.
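
To illustrate the idea on a small scale, here is a sketch with a made-up table in which two of three rows contain a missing value. The names toy_wide and toy_complete and all values are hypothetical.

```r
library(dplyr)
library(tidyr)

# Toy wide-format data with missing values (hypothetical)
toy_wide <- tibble(
  country = c("A", "B", "C"),
  `1900` = c(1, NA, 3),
  `1901` = c(2, 5, NA)
)

# Called without column arguments, drop_na() removes every row
# that contains at least one NA in any column
toy_complete <- toy_wide %>%
  drop_na()

# The number of remaining rows is the number of complete cases
nrow(toy_complete)
```

Since each row is one country in the wide format, counting the remaining rows counts the countries with complete data.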

As in the previous set of data wrangling exercises, we now want to transform the data into the long format.

3

Transform the gap_pop dataset into a sensible long format. Name the variable representing the values for population pop and store the resulting dataframe in an object with the same name as before (gap_pop). Also change the type of the year variable to integer.
This is just a repetition from the Tidy Data exercises. What we want to do is to gather the columns with the years into a year variable. To change the variable type, you need to use mutate().
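
The reshaping step could look roughly like the sketch below, shown on a made-up table. It uses pivot_longer(), which tidyr offers as the newer alternative to gather(); which of the two the course materials use is an assumption here. The names toy_wide and toy_long and all values are hypothetical.

```r
library(dplyr)
library(tidyr)

# Toy wide-format data (hypothetical values)
toy_wide <- tibble(
  country = c("A", "B"),
  `1900` = c(1, 2),
  `1901` = c(3, 4)
)

toy_long <- toy_wide %>%
  # Stack all columns except country: the old column names go into
  # year, the cell values into pop
  pivot_longer(-country, names_to = "year", values_to = "pop") %>%
  # Column names are text, so year starts out as character
  mutate(year = as.integer(year))

toy_long
```

With gather(), the same reshaping would be written as gather(year, pop, -country); the mutate() step stays the same.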

Now let’s apply some of the advanced filtering options we discussed in the Data Wrangling - Part 2 session.

4

Create two new dataframes that include different subsets of the gap_pop data:

  1. Data for all countries for the 1990s (name this one gap_pop_1990s),

  2. Data for all years but only for Germany (name this one gap_pop_ger).

NB: There are different Germanies in the dataset: West Germany, East Germany, and Germany.
You need to use a helper function from dplyr to create the first new data frame and a specific matching operator to create the second one.
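
One possible reading of this hint is between() for the year range and %in% for the country names. The sketch below shows both on a made-up long-format table. It assumes that all three Germany labels should be kept; the names toy_long, toy_1990s and toy_ger and all values are hypothetical.

```r
library(dplyr)

# Toy long-format data (hypothetical values)
toy_long <- tibble(
  country = c("Germany", "West Germany", "France", "Germany"),
  year = c(1989L, 1995L, 1995L, 2001L),
  pop = c(1, 2, 3, 4)
)

# between() keeps values inside a closed interval (both bounds included)
toy_1990s <- toy_long %>%
  filter(between(year, 1990L, 1999L))

# %in% keeps rows whose value matches any element of the vector
toy_ger <- toy_long %>%
  filter(country %in% c("Germany", "West Germany", "East Germany"))
```

Unlike ==, the %in% operator compares each value against the whole vector, so several spellings can be matched in a single filter() call.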

For some comparisons (especially via plots), it might help to know which continent the country is located on. For this purpose, we will create a new continent variable. As it would be quite tedious to create this variable manually for all of the countries in the dataset, we will do this only for a subset in this exercise. Just run the following code in your local script to create this subset.

gap_pop_subset <- gap_pop %>% 
  filter(country %in% 
           c("Netherlands", "Brazil", "China", "Algeria", "New Zealand"))

5

Create a continent variable for the countries in gap_pop_subset. The variable should be a factor with the following values: Africa, Americas, Asia, Europe, Oceania.
You can use recode_factor() to create the new variable. Alternatively, you could also use case_when() here. However, the latter would require more typing, which is something that we generally want to avoid.
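
A minimal sketch of the recode_factor() approach, using a stand-in table with one row per country instead of gap_pop_subset. The object name toy_subset is hypothetical, and the order of the factor levels (alphabetical) is an assumption.

```r
library(dplyr)

# Stand-in for gap_pop_subset with one row per country (hypothetical)
toy_subset <- tibble(
  country = c("Netherlands", "Brazil", "China", "Algeria", "New Zealand")
)

toy_subset <- toy_subset %>%
  mutate(
    # Each pair reads old value = new value; the factor levels follow
    # the order in which the replacements are listed
    continent = recode_factor(
      country,
      "Algeria" = "Africa",
      "Brazil" = "Americas",
      "China" = "Asia",
      "Netherlands" = "Europe",
      "New Zealand" = "Oceania"
    )
  )

levels(toy_subset$continent)
```

A case_when() version would need one condition of the form country == "..." ~ "..." per country plus a final conversion to factor, which is the extra typing the hint refers to.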